Data visualization is the graphical representation of data. It condenses a large dataset into compact graphs, which aids data analysis and prediction. It is an indispensable part of data science because it makes complex data more understandable and accessible. Matplotlib and Seaborn are the backbone of data visualization in Python.
Matplotlib: A Python library for plotting graphs, typically used together with libraries such as NumPy and Pandas. It is used for drawing statistical inferences and plotting 2D graphs of arrays.
Seaborn: Also a Python library for plotting graphs, used together with Matplotlib, Pandas, and NumPy. It is built on top of Matplotlib and provides a higher-level interface to it. It helps in visualizing univariate and bivariate data, applies attractive themes to Matplotlib graphics, is an important tool for visualizing linear regression models, and can plot statistical time-series data. Its default styles also reduce visual clutter in graphs and improve their appearance.
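Because Seaborn is built on Matplotlib, a Seaborn plotting call works with ordinary Matplotlib objects. A minimal sketch of that relationship, assuming a small made-up DataFrame (the names `toy_df`, `x`, and `y` are illustrative only) and the non-interactive Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not one of the tutorial's datasets
toy_df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 3.5, 3.0, 5.0]})

fig, ax_in = plt.subplots()
ax = sns.scatterplot(x="x", y="y", data=toy_df, ax=ax_in)

# Seaborn draws on, and returns, the Matplotlib Axes it was given,
# so any Matplotlib method can be used to customize the plot.
ax.set_title("Seaborn draws on a Matplotlib Axes")
print(ax is ax_in)
```

This is why the later examples can freely mix sns calls with plt.title(), plt.xlabel(), and similar Matplotlib functions.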
# Import libraries
import seaborn as sns # for Data visualization
from scipy.stats import norm # normal distribution, used later for the fit parameter
import matplotlib.pyplot as plt # for data visualization
# pandas is used here for read_csv and for building DataFrames
import pandas as pd # for data analysis
import numpy as np
If you have two numeric variables and want to know how they relate to each other, the seaborn line plot can help. The library provides the sns.lineplot() function to draw a line graph of two numeric variables, x and y.
Syntax: sns.lineplot(x=None, y=None, hue=None, size=None, style=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, dashes=True, markers=None, style_order=None, units=None, estimator='mean', ci=95, n_boot=1000, sort=True, err_style='band', err_kws=None, legend='brief', ax=None, **kwargs)
# Load the tips dataset from the seaborn GitHub data repository
tips_df = sns.load_dataset("tips")
tips_df
# First, create two lists of values
days = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
temperature = [36.6, 37, 37.7,39,40.1,43,43.4,45,45.6,40.1,44,45,46.8,47,47.8]
# Create a DataFrame from the two lists, days and temperature
temp_df = pd.DataFrame({"days":days, "temperature":temperature})
# Draw line plot
sns.lineplot(x = "days", y = "temperature", data=temp_df,)
plt.show() # to show graph
# Let's take data from the imported dataset
sns.lineplot(x = "total_bill", y = "tip", data = tips_df )
# Draw line plot of tip and size
sns.lineplot(x = "tip", y = "size", data = tips_df)
So far, we have drawn line plots using only the x, y, and data parameters. Now let's use several more parameters and see the result.
hue => Draws a separate line for each level of a third, categorical variable. The example below plots the relationship between size (x-axis) and total_bill (y-axis), with separate lines for the Female and Male categories of the variable sex.
style => Varies the line style, such as dashes, for each level of a variable, so each line looks different.
palette => Sets the colormap for the lines, given as a palette or colormap name such as "hot" in the example below.
dashes => Controls whether styled lines are dashed. Pass False for solid lines, or True for the default dashes.
markers => Sets the markers drawn at each data point (x1, y1). Matplotlib marker codes such as "o" and "<" are accepted.
legend => Controls the legend. The default value is "brief", but you can pass "full" or False. False draws no legend.
# Draw line plot of size and total_bill with parameters
sns.lineplot(x = "size", y="total_bill", data=tips_df, hue="sex",
style = "sex", palette = "hot", dashes = False,
markers = ["o", "<"],legend="brief",)
plt.title("Line Plot", fontsize = 20) # for title
plt.xlabel("Size", fontsize = 15) # label for x-axis
plt.ylabel("Total Bill", fontsize = 15) # label for y-axis
plt.show()
The line plot above is small and has a white background, but you can change both using the plt.figure() and sns.set() functions.
plt.figure(figsize = (16,9)) # figure size with ratio 16:9
sns.set(style='darkgrid',) # background darkgrid style of graph
# Draw line plot of size and total_bill with parameters
sns.lineplot(x = "size", y = "total_bill", data = tips_df, hue = "sex",
style = "sex", palette = "hot", dashes = False,
markers = ["o", "<"], legend="brief",)
plt.title("Line Plot", fontsize = 20)
plt.xlabel("Size", fontsize = 15)
plt.ylabel("Total Bill", fontsize = 15)
plt.show()
Using the hue parameter of sns.lineplot(), we can draw multiple lines in one graph. The graphs above drew two lines (Female and Male) in a single graph; in the same way, the example below uses the categorical variable day, which has four categories.
plt.figure(figsize = (16,9))
sns.set(style='darkgrid',)
# Draw line plot of size and total_bill with parameters and hue "day"
sns.lineplot(x = "size", y = "total_bill", data = tips_df, hue = "day",
style = "day", palette = "hot", dashes = False,
markers = ["o", "<", ">", "^"], legend="brief",)
plt.title("Line Plot", fontsize = 20)
plt.xlabel("Size", fontsize = 15)
plt.ylabel("Total Bill", fontsize = 15)
plt.show()
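If you have a numeric dataset and want to visualize its distribution, the seaborn histogram can help.
You may wonder why you would plot a histogram with seaborn's distplot when Matplotlib's plt.hist() does the same job. The difference is that sns.distplot() can also draw a kernel density estimate, a rug plot, and a fitted distribution on the same axes.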
#Plot Histogram of "size", Taken "size" from above dataset
sns.distplot(tips_df["size"])
#Plot Histogram of "tip"
sns.distplot(tips_df["tip"])
The seaborn distplot function has a number of parameters that help customize the histogram.
Syntax: sns.distplot(a, bins=None, hist=True, kde=True, rug=False, fit=None, hist_kws=None, kde_kws=None, rug_kws=None, fit_kws=None, color=None, vertical=False, norm_hist=False, axlabel=None, label=None, ax=None)
a: The numeric data to plot, passed as a Series, 1D array, or list, as in the examples above.
bins: The number of bins, or a list of bin edges. For example, if the data ranges from 1 to 55 and you want each bar to cover a step of 5, pass the edges 1, 5, 10, ..., 55 as a list; an integer such as 55 instead requests that many equal-width bins.
#Plot Histogram of "total_bill" with bins parameters
sns.distplot(tips_df["total_bill"], bins=55)
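To see what the bins argument controls, it can help to compute the bin counts directly. A sketch using np.histogram on made-up values (the `bills` array is hypothetical); it illustrates the binning logic only and makes no claim about seaborn's internals:

```python
import numpy as np

# Hypothetical bill amounts
bills = np.array([3, 7, 12, 18, 22, 22, 31, 48])

# Explicit bin edges in steps of 5: 0, 5, 10, ..., 55
edges = np.arange(0, 56, 5)
counts_edges, _ = np.histogram(bills, bins=edges)

# An integer instead asks for that many equal-width bins
# spanning the minimum and maximum of the data
counts_int, edges_int = np.histogram(bills, bins=5)

print(counts_edges)
print(counts_int, edges_int)
```

With explicit edges the bar boundaries fall on round numbers, which is why the later "best format" example passes a list of edges and reuses it for plt.xticks().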
# hist: Pass False if you don't need the histogram bars; the default True draws them.
#Plot Histogram of "total_bill" with hist parameters
sns.distplot(tips_df["total_bill"], hist = False)
# kde: Stands for "kernel density estimate". Pass True to draw it or False to hide it.
#Plot Histogram of "total_bill" with kde (kernel density estimate) parameter
sns.distplot(tips_df["total_bill"], kde=False,)
# axlabel: Give a name to the x-axis.
#Plot Histogram of "total_bill" with axlabel parameter
sns.distplot(tips_df["total_bill"],axlabel="Total Bill",)
# label: Give a label to the histogram. It is only displayed when you call matplotlib.pyplot's plt.legend() function.
#Plot Histogram of "total_bill" with label parameters
sns.distplot(tips_df["total_bill"],label="Total Bill",)
plt.title("Histogram of Total Bill") # for histogram title
plt.legend() # for label
# fit: Fit a distribution to the data and draw it. Pass norm (from scipy.stats import norm) and set kde to False to show only the fitted normal curve.
#Plot Histogram of "total_bill" with fit and kde parameters
sns.distplot(tips_df["total_bill"], fit=norm, kde=False) # fit needs: from scipy.stats import norm
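What fit=norm draws can be roughly reproduced by hand: a normal distribution is fitted to the data, and its density curve is plotted over a density-scaled histogram. A sketch under the assumption of randomly generated data (`sample` is hypothetical), using scipy.stats.norm.fit and plain Matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# Hypothetical sample standing in for a numeric column
rng = np.random.default_rng(1)
sample = rng.normal(loc=20, scale=8, size=500)

# norm.fit returns maximum-likelihood estimates of the mean and standard deviation
mu, sigma = norm.fit(sample)

fig, ax = plt.subplots()
# density=True puts the bars on the same scale as the density curve
ax.hist(sample, bins=20, density=True, alpha=0.6)
grid = np.linspace(sample.min(), sample.max(), 200)
ax.plot(grid, norm.pdf(grid, loc=mu, scale=sigma), linestyle="--")
ax.set_xlabel("Total Bill")
print(round(float(mu), 2), round(float(sigma), 2))
```

The fitted mean and standard deviation equal the sample mean and the population-style standard deviation of the data, so the curve only describes the data well when it is close to normal.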
# Best way to plot a seaborn histogram
#Plot histogram in best format
plt.figure(figsize=(16,9))
sns.set() # for style
bins = [1,5,10,15,20,25,30,35,40,45,50,55]
sns.distplot(tips_df["total_bill"],bins=bins,
hist_kws = {'color':'#DC143C', 'edgecolor':'#aaff00',
'linewidth':5, 'linestyle':'--', 'alpha':0.9},
kde=False,
fit = norm,
fit_kws = {'color':'#8e00ce',
'linewidth':12, 'linestyle':'--', 'alpha':0.4},
rug = True,
rug_kws = {'color':'#0426d0', 'edgecolor':'#00dbff',
'linewidth':3, 'linestyle':'--', 'alpha':0.9},
label = "TB")
plt.xticks(bins)
plt.title("Histogram of Restaurant Total Bill", fontsize = 20)
plt.xlabel("Total Bill", fontsize = 15)
plt.legend()
plt.show()
# Plot multiple seaborn histogram in single graph
plt.figure(figsize=(16,9))
sns.set() # for style
sns.distplot(tips_df["total_bill"], bins=9, label="total_bill")
sns.distplot(tips_df["tip"], bins=9, label="tip")
sns.distplot(tips_df["size"], bins=9, label = "size")
plt.legend()
If you have x and y variables and want to show the relationship between them using a bar graph, the seaborn barplot can help. The sns.barplot() function draws a bar plot conveniently.
Bar graph or bar plot: A bar plot visualizes a categorical variable against a numeric variable in a single graph, to show the relationship between them.
Syntax: sns.barplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, orient=None, color=None, palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True, ax=None, **kwargs)
# Plot barplot
sns.barplot()
# sns.barplot() x, y parameters
# Plot tips_df.day & tips_df.total_bill barplot
sns.barplot(x = tips_df.day, y = tips_df.total_bill)
# data: Pass the dataset as a DataFrame, array, or list of arrays (optional)
# Pass the dataset using the data parameter
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
data = tips_df)
# The days above are not in the desired order on the x-axis
# Use the order parameter to set the order of day explicitly
order = ['Sun', 'Thur', 'Fri', 'Sat']
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
data = tips_df, order=order)
# To arrange the bars within each day by Female and Male, set the hue order
# Modify hue order
hue_order = ['Female', 'Male']
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
data = tips_df, hue_order = hue_order)
# sns.barplot() estimator parameter
# It accepts a NumPy statistical function such as mean, median, max, or min,
# which is applied to the y values within each categorical bin before plotting.
# For example, np.max makes each bar show the maximum y value of its category.
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
data = tips_df, estimator= np.max)
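To check what the estimator does, the bar heights can be compared with a pandas groupby. A sketch on made-up data (`toy_df` and its values are hypothetical), drawing one bar per day with np.max as the estimator:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical bills, two per day
toy_df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 30.0, 12.0, 18.0, 25.0, 45.0],
})

fig, ax = plt.subplots()
sns.barplot(x="day", y="total_bill", data=toy_df, estimator=np.max, ax=ax)

# Each bar is a Matplotlib patch whose height is the estimated value
bar_heights = sorted(float(patch.get_height()) for patch in ax.patches)
expected = sorted(toy_df.groupby("day")["total_bill"].max())
print(bar_heights)
print(expected)
```

With the default estimator the same comparison would use .mean() instead of .max().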
# sns.barplot() kwargs parameter
# Extra keyword arguments are passed on to Matplotlib and style the bars, for example their transparency and edges.
# Keyword arguments parameter
kwargs = {'alpha':0.9, 'linestyle':':', 'linewidth':5, 'edgecolor':'k'}
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
data = tips_df,**kwargs)
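So far, we have used the main barplot parameters one at a time; now it is time to use them together for a more professional result. The barplot example below also uses some other functions, such as sns.set(), plt.figure(), plt.title(), plt.xlabel(), plt.ylabel(), and plt.savefig():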
# Example of Seaborn Barplot
sns.set()
plt.figure(figsize = (16,9))
sns.barplot(x = 'day', y = 'total_bill',
data = tips_df, alpha =1, linestyle = "-.", linewidth = 3,
edgecolor = "k")
plt.title("Barplot of Days and Total Bill", fontsize = 20)
plt.xlabel("Days", fontsize = 15)
plt.ylabel("Total Bill", fontsize = 15)
plt.savefig("Barplot of Days and Total Bill")
plt.show()
If you have numeric x and y variables, or one of them is categorical, and you want to find the relationship between them to gain insights, the seaborn scatter plot function sns.scatterplot() can help.
Besides sns.scatterplot(), seaborn has other functions that can draw scatter plots, such as sns.lmplot(), sns.relplot(), and sns.pairplot(), but sns.scatterplot() is the most direct way to create one.
Syntax: sns.scatterplot(x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha='auto', x_jitter=None, y_jitter=None, legend='brief', ax=None, **kwargs)
# Load the titanic dataset from the seaborn GitHub data repository
titanic_df = sns.load_dataset("titanic")
titanic_df
# Method 1:
# Draw Seaborn Scatter Plot to find relationship between age and fare
sns.scatterplot(x = "age", y = "fare", data = titanic_df)
# Method 2:
# Draw Seaborn Scatter Plot to find relationship between age and fare
sns.scatterplot(x = titanic_df.age, y = titanic_df.fare)
# Method 3:
# Draw Seaborn Scatter Plot to find relationship between age and fare
sns.scatterplot(x = titanic_df['age'], y = titanic_df['fare'])
# sns.scatterplot() hue parameter
# hue: Pass the name of a variable in data, or a vector, to color the points by group (optional)
# scatter plot hue parameter
sns.scatterplot(x = "age", y = "fare", data = titanic_df, hue = "sex")
# The hue_order parameter changes the order in which the hue categories are processed and listed.
# scatter plot hue_order parameter
sns.scatterplot(x = "age", y = "fare", data = titanic_df, hue = "sex",
hue_order= ['female', 'male'])
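The effect of hue and hue_order can be inspected on the objects seaborn creates. A sketch with a small made-up table (`toy_df` and its values are hypothetical, not taken from the titanic dataset) that checks how many points were drawn and in which order the legend lists the categories:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical passengers
toy_df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2],
    "fare": [7.0, 71.0, 8.0, 53.0, 52.0, 21.0],
    "sex": ["male", "female", "female", "female", "male", "male"],
})

fig, ax = plt.subplots()
sns.scatterplot(x="age", y="fare", data=toy_df, hue="sex",
                hue_order=["female", "male"], ax=ax)

# The points are stored in the first collection on the Axes
points = ax.collections[0].get_offsets()
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(len(points), legend_labels)
```

Fixing hue_order is mostly useful for keeping colors and legend order consistent across several plots of the same data.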
# sns.scatterplot() returns a Matplotlib Axes object
# Use its ax.set() method to change the x-axis label, y-axis label, and title.
ax = sns.scatterplot(x = "age", y = "fare", data = titanic_df, )
ax.set(xlabel = "Age",
ylabel = "Fare",
title = "Seaborn Scatter Plot of Age and Fare")
sns.scatterplot() also accepts the keyword arguments of Matplotlib's plt.scatter(), such as edgecolor, facecolor, linewidth, and linestyle:
# scatter plot kwargs (keyword arguments)
plt.figure(figsize=(16,9)) # figure size in 16:9 ratio
kwargs = {'edgecolor':"r",
'facecolor':"k",
'linewidth':2.7,
'linestyle':'--',
}
sns.scatterplot(x = "age", y = "fare", data = titanic_df, size = "sex", sizes = (500, 1000), alpha = .7, **kwargs)
sns is the conventional short alias for the seaborn Python library. A heatmap shows 2D (two-dimensional) data in graphical form: each value of the matrix is drawn as a cell whose color encodes that value.
Syntax: sns.heatmap(data, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)
# Let's create 2D array
array_2d = np.linspace(1,5,12).reshape(4,3) # create numpy 2D array
print(array_2d) # print numpy array
sns.heatmap(array_2d) # create heatmap
globalWarming_df = pd.read_csv("Who_is_responsible_for_global_warming.csv")
globalWarming_df.head()
# set country name as index and drop Country Code, Indicator Name and Indicator Code
globalWarming_df = globalWarming_df.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code']).set_index('Country Name')
globalWarming_df
# Create heatmap
plt.figure(figsize=(16,9))
sns.heatmap(globalWarming_df)
# change heatmap color using cmap
plt.figure(figsize=(16,9))
sns.heatmap(globalWarming_df, cmap="coolwarm")
# If you want to see the value in graph
# annot (annotate) parameter
plt.figure(figsize=(16,9))
sns.heatmap(globalWarming_df, annot = True)
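A compact way to see what annot does is to count the text labels it adds. A sketch on a small made-up integer array (`array_2d` here is hypothetical and differs from the one used earlier), with fmt="d" so the integers are printed without decimals:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical 4x3 matrix holding the integers 1..12
array_2d = np.arange(1, 13).reshape(4, 3)

fig, ax = plt.subplots()
sns.heatmap(array_2d, annot=True, fmt="d", cmap="coolwarm", ax=ax)

# annot=True writes one text label per cell
labels = [text.get_text() for text in ax.texts]
print(len(labels))
```

For float data, a format such as fmt=".1f" keeps the labels short enough to fit inside the cells.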
# annot_kws parameter: keyword arguments that style the annotation text
# linewidths and linecolor draw lines between the cells
plt.figure(figsize=(16,9))
annot_kws={'fontsize':10,
'fontstyle':'italic',
'color':"k",
'alpha':1.0,
'rotation':"vertical",
'verticalalignment':'center',
'backgroundcolor':'w'}
sns.heatmap(globalWarming_df, annot = True, annot_kws= annot_kws, linewidths=4,linecolor="k")
plt.figure(figsize=(20,15))
annot_kws={'fontsize':10,
'fontstyle':'italic',
'color':"k",
'alpha':1.0,
'rotation':"vertical",
'verticalalignment':'center',
'backgroundcolor':'w'}
ax = sns.heatmap(globalWarming_df, annot = True, annot_kws= annot_kws, linewidths=4,linecolor="k")
# set seaborn heatmap title, x-axis, y-axis label and font size
ax.set(title="Heatmap", xlabel="Years", ylabel="Country Name",)
sns.set(font_scale=2) # scale fonts by a factor of 2 (applies to plots drawn after this call)
A main use of the heatmap in Python is to visualize a correlation matrix. When you want to know how multiple features relate to each other, and which features are good candidates for building a machine learning model, compute the correlation matrix of the dataset and visualize it with sns.heatmap().
A correlation coefficient is a value from -1 to 1.
# sns heatmap correlation
plt.figure(figsize=(16,9))
sns.heatmap(globalWarming_df.corr(), annot = True)
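To make the -1 to 1 range concrete, here is a sketch on made-up columns (`toy_df` is hypothetical) built so that two pairs are perfectly correlated. Fixing vmin, vmax, and center, as done here, is one reasonable choice for correlation heatmaps because it keeps 0 at the middle of the color scale:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical features
toy_df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # b = 2 * a, perfectly positive
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],   # c = 6 - a, perfectly negative
    "d": [1.0, 3.0, 2.0, 5.0, 4.0],   # loosely follows a
})

corr = toy_df.corr()  # Pearson correlation by default

fig, ax = plt.subplots()
sns.heatmap(corr, vmin=-1, vmax=1, center=0, cmap="coolwarm", annot=True, ax=ax)
print(corr.round(2))
```

The diagonal is always 1 because every feature is perfectly correlated with itself.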
# Upper triangle seaborn heatmap with mask
plt.figure(figsize=(16,9))
corr_mx = globalWarming_df.corr() # correlation matrix
matrix = np.tril(corr_mx) # lower triangle (including the diagonal) of the correlation matrix
# np.tril() keeps the lower triangle; passing it as mask hides those cells, so only the upper triangle is drawn.
sns.heatmap(corr_mx, mask=matrix)
# Lower triangle heatmap
plt.figure(figsize=(16,9))
corr_mx = globalWarming_df.corr() # correlation matrix
matrix = np.triu(corr_mx) # upper triangle (including the diagonal), which the mask hides
sns.heatmap(corr_mx, mask=matrix)
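The masks above are built from the correlation values themselves, so a coefficient that happens to be exactly 0 would count as "not masked". A sketch of a value-independent boolean mask, assuming randomly generated data (`toy_df` is hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical features
rng = np.random.default_rng(2)
toy_df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
corr_mx = toy_df.corr()

# True marks a cell to hide: here the upper triangle including the diagonal,
# regardless of the correlation values
mask = np.triu(np.ones_like(corr_mx, dtype=bool))

fig, ax = plt.subplots()
sns.heatmap(corr_mx, mask=mask, annot=True, vmin=-1, vmax=1, ax=ax)

n_visible = int((~mask).sum())
print(n_visible)
```

Swapping np.triu for np.tril hides the lower triangle instead.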
# import libraries
import seaborn as sns # for data visualization
import matplotlib.pyplot as plt # for data visualization
import pandas as pd # for data analysis
# load dataset and create DataFrame ready to create heatmap
flights = sns.load_dataset("flights")
flights_df = flights.pivot(index="month", columns="year", values="passengers") # keyword arguments are required in recent pandas
# set heatmap size
plt.figure(figsize= (16,9))
# create heatmap seaborn
cbar_kws = {"shrink":.8,
'extend':'max',
'extendfrac':.2,
"drawedges":True}
# flights_df.corr() correlates the year columns with each other, so both axes show years
sns.heatmap(flights_df.corr(), cmap="inferno", annot = True, linewidths = 2, cbar_kws=cbar_kws)
plt.title("Heatmap Correlation of 'Flights' Dataset", fontsize = 25)
plt.xlabel("Years", fontsize = 20)
plt.ylabel("Years", fontsize = 20)
plt.show()
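The pivot step in the example above reshapes a long table (one row per month and year) into a wide one (months as rows, years as columns), which is the matrix layout a heatmap needs. A sketch on a tiny made-up table (`long_df` and its numbers are hypothetical):

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, year) pair
long_df = pd.DataFrame({
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "year": [1949, 1949, 1950, 1950],
    "passengers": [10, 20, 30, 40],
})

# Rows come from "month", columns from "year", cell values from "passengers"
wide_df = long_df.pivot(index="month", columns="year", values="passengers")
print(wide_df)
```

Each (month, year) pair must occur only once for pivot to work; with duplicates, pivot_table with an aggregation function would be the usual alternative.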
The seaborn pairplot shows the relationship between every pair of variables in a Pandas DataFrame. A seaborn scatter plot draws only one pair of variables, whereas sns.pairplot() draws the pairwise plots of multiple features/variables in a grid.
Syntax: sns.pairplot(data, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind='scatter', diag_kind='auto', markers=None, height=2.5, aspect=1, dropna=True, plot_kws=None, diag_kws=None, grid_kws=None, size=None)
# Let's load another dataset
from sklearn.datasets import load_breast_cancer
cancer_dataset = load_breast_cancer()
# create DataFrame
cancer_df = pd.DataFrame(np.c_[cancer_dataset['data'],cancer_dataset['target']],
columns = np.append(cancer_dataset['feature_names'], ['target']))
cancer_df.head(6)
#plot seaborn pairplot
sns.pairplot(cancer_df)
# vars: Plots the pairplot for only the required features/variables.
sns.pairplot(cancer_df, vars=['mean radius', 'mean texture','mean perimeter', 'mean area', 'mean smoothness'])
# hue: Map the third feature to get more insights. Pass string (variable name), optional
sns.pairplot(cancer_df, vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
'mean smoothness'], hue ='target')
# hue_order: To change the order of hue. Pass list of strings
sns.pairplot(cancer_df, vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
'mean smoothness'], hue ='target', hue_order = [1.0, 0.0])
# x_vars, y_vars: Use these to choose which features go on the x-axis and which on the y-axis.
sns.pairplot(cancer_df, hue ='target', x_vars = ['mean radius', 'mean texture'], y_vars =['mean radius'])
# kind: The kind of plot drawn for each pair. Pass {'scatter', 'reg'} (optional)
# 'reg' adds a linear regression fit to each panel, which helps judge linearity.
sns.pairplot(cancer_df, vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
'mean smoothness'], hue ='target', kind = 'reg')